Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge
Abstract
Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is demonstrated to efficiently solve the eigenvalue problems for graph Laplacians that arise in spectral clustering. For static graph partitioning, 10–20 iterations of LOBPCG without preconditioning result in ~10x error reduction. That is enough to achieve 100% correctness for all Challenge datasets with known truth partitions. For example, graphs with 5K/0.1M (50K/1M) vertices/edges are partitioned in 2 (7) seconds, compared to over 5,000 (30,000) seconds needed by the baseline Python code. Our Python code determines, with 100% correctness, 98 (160) clusters from the Challenge static graphs with 0.5M (2M) vertices in 270 (1,700) seconds using 10GB (50GB) of memory. Our single-precision MATLAB code calculates the same clusters in half the time and memory. For streaming graph partitioning, LOBPCG is initialized with the approximate eigenvectors of the graph Laplacian already computed for the previous graph. In many cases this reduces the number of required LOBPCG iterations 2–3 times compared to the static case. Our spectral clustering is generic, i.e., it assumes nothing specific about the block model or the streaming used to generate the Challenge graphs, in contrast to the base code. Nevertheless, we ran a 10-stage streaming comparison with the base code on the 5K graph. The quality of our clusters is similar or better starting at stage 4 (7) for emerging-edge (snowball) streaming, while the computations are 100–1000 times faster.
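The static and streaming pipelines described in the abstract can be sketched in Python with SciPy's `scipy.sparse.linalg.lobpcg`. This is a minimal illustration under stated assumptions, not the authors' Challenge code. It makes the following choices:

- It uses the unnormalized Laplacian L = D - A with no preconditioner.
- It clusters the rows of the k computed eigenvectors with k-means (`scipy.cluster.vq.kmeans2`).
- It simulates emerging-edge streaming by revealing a randomly ordered stochastic-block-model edge list in stages.

The helpers `sbm_edges`, `adjacency`, `spectral_embedding`, `cluster` and `purity` are hypothetical. So are the block sizes and edge probabilities. The static and streaming cases differ only in the initial block passed to LOBPCG: random vectors, or the eigenvectors kept from the previous stage.

```python
# Sketch under assumptions: unnormalized Laplacian L = D - A, no preconditioner,
# k-means on the eigenvector rows. Helper names and SBM parameters are hypothetical.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import lobpcg
from scipy.cluster.vq import kmeans2


def sbm_edges(block_sizes, p_in, p_out, rng):
    """Hypothetical helper: sample undirected SBM edges (i < j) and true block labels."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    i, j = np.triu_indices(labels.size, k=1)
    p = np.where(labels[i] == labels[j], p_in, p_out)  # within- vs between-block probability
    keep = rng.random(p.size) < p
    return np.column_stack([i[keep], j[keep]]), labels


def adjacency(edges, n):
    """Symmetric sparse adjacency matrix from an undirected edge list."""
    a = sp.coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n))
    return (a + a.T).tocsr()


def spectral_embedding(a, k, x0, maxiter=20):
    """k smallest eigenpairs of L = D - A by unpreconditioned LOBPCG.

    x0 is the initial block: random for a static graph, or the previous
    stage's eigenvectors when streaming (warm start).
    """
    lap = laplacian(a.astype(float))
    _, vecs = lobpcg(lap, x0, largest=False, maxiter=maxiter)
    return vecs


def cluster(vecs, k, seed=0):
    """Assign each vertex to one of k clusters by k-means on the eigenvector rows."""
    _, labels = kmeans2(vecs, k, minit="++", seed=seed)
    return labels


def purity(pred, truth):
    """Fraction of vertices whose predicted cluster's majority true label equals their own."""
    total = sum(np.bincount(truth[pred == c]).max() for c in np.unique(pred))
    return total / truth.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, stages = 3, 5
    edges, truth = sbm_edges([100] * k, p_in=0.3, p_out=0.01, rng=rng)
    edges = edges[rng.permutation(len(edges))]  # edges "emerge" in random order
    n = truth.size
    x = rng.standard_normal((n, k))  # random initial block, used only at stage 1
    for s in range(1, stages + 1):
        a = adjacency(edges[: len(edges) * s // stages], n)
        x = spectral_embedding(a, k, x)  # warm start from the previous stage
        print(f"stage {s}: purity {purity(cluster(x, k), truth):.3f}")
```

The `if __name__ == "__main__"` loop passes the previous stage's eigenvectors as `x0`. This is the warm start described above. Consecutive graphs share most of their edges, so that block is usually already close to the new eigenvector subspace, which is why fewer iterations tend to be needed. The `maxiter=20` default mirrors the 10–20 iterations quoted for the static case. LOBPCG may emit a warning if it stops before reaching its default tolerance.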
Similar resources
Node Clustering in Graphs: An Empirical Study
Modeling networks is an active area of research and is used for many applications ranging from bioinformatics to social network analysis. An important operation that is often performed in the course of graph analysis is node clustering. Popular methods for node clustering such as the normalized cut method have their roots in graph partition optimization and spectral graph theory. Recently, ther...
Modern Preconditioned Eigensolvers for Spectral Image Segmentation and Graph Bisection (Workshop on Clustering Large Data Sets, Third IEEE International Conference on Data Mining, ICDM 2003)
Known spectral methods for graph bipartition and image segmentation require numerical solution of eigenvalue problems with the graph Laplacian. We discuss several modern preconditioned eigenvalue solvers for computing the Fiedler vectors of large scale eigenvalue problems. The ultimate goal is to find a method with a linear complexity, i.e. a method with computational costs that scale linearly ...
18.S096: Community detection and the Stochastic Block Model
Community detection in a network is a central problem in data science. A few lectures ago we discussed clustering and gave a performance guarantee for spectral clustering (based on Cheeger’s Inequality) that was guaranteed to hold for any graph. While these guarantees are remarkable, they are worst-case guarantees and hence pessimistic in nature. In what follows we analyze the performance of a ...
Spectral Partitioning in a Stochastic Block Model
In this lecture, we will perform a crude analysis of the performance of spectral partitioning algorithms in what are called stochastic block models or a planted partition model. The name you choose largely depends on your community and application. As we are especially interested today in partitioning, we will call it the planted partition model. In this model, we build a random graph that has ...
Spectral clustering and the high-dimensional Stochastic Block Model
Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasibl...
Journal title:
- CoRR
Volume: abs/1708.07481
Publication date: 2017